Hostexec: Enable building with cmake4 #298

HereThereBeDragons · 2025-09-16T15:19:22Z

Motivation

Commit ROCm/TheRock@267c4d9 enabled building hostexec, but hostexec does not build with cmake 4. Since ROCm/TheRock#1440, however, TheRock promises to build with cmake 4, so this needs to be fixed.

Technical Details

Bump the minimum required cmake version for hostexec from 3.0 to 3.20.0 to enable building with cmake4. This is the same minimum required version as the parent directory "offload" uses.
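A minimal sketch of this change, assuming the version requirement is the `cmake_minimum_required` call in `offload/hostexec/CMakeLists.txt` (the file named in the build log later in this thread). The exact form of the original line is not quoted in this PR, so the commented-out "before" line is an assumption:

```cmake
# offload/hostexec/CMakeLists.txt (excerpt; surrounding lines omitted)

# Before (assumed form): CMake 4.x rejects this, because compatibility
# with CMake < 3.5 has been removed.
# cmake_minimum_required(VERSION 3.0)

# After: same minimum as the parent "offload" directory.
cmake_minimum_required(VERSION 3.20.0)
```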

The underlying issue was that I forgot to clean the cache directory before running the test, so the test sometimes ran on a dirty cache and failed spuriously. Since the code runs a single comgr action that only converts spirv->bc, the cache should contain 2 files: the bitcode and the cache timestamp.

In this patch, we add a new action, AMD_COMGR_ACTION_COMPILE_SPIRV_TO_RELOCATABLE, which accepts a set of .spv files, translates them to .bc files, extracts any embedded @llvm.cmdline flags, and then compiles them to a set of relocatable .o files.

Note that this is not an NFC change because the test case `llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll` has been updated due to the recent SGPR layout change. The 32 CSR SGPRs in `callee_need_to_spill_fp_exec_to_memory` have been adjusted to reflect this update. Change-Id: I332a721e7e8feaa5491c63228ecb42759e4d979d

This PR updates the SGPR layout to a striped caller/callee-saved design, similar to the VGPR layout. To ensure that s30-s31 (return address), s32 (stack pointer), s33 (frame pointer), and s34 (base pointer) remain callee-saved, the striped layout starts from s40, with a stripe width of 8. The last stripe is 10 wide instead of 8 to avoid ending with a 2-wide stripe. Fixes llvm#113782. Change-Id: I6fe8fca8b70985a8775ec04d93b460333533d2bb

For the hipBinNVPtr_ and hipBinAMDPtr_ members: the destructor of the base class was not marked as virtual, although the destructors of the derived classes are. The objects are deleted through a pointer to the base class, so the base class destructor is called but the derived class destructors are not. This results in strange memory behaviour detected by ASAN. Solves SWDEV-516418

…126058) (llvm#3162) GlobalISel already handles undefined workitem.id.{x,y,z} intrinsics, SelDAG failed in AMDGPUISelLowering.cpp due to a failed assertion in `AMDGPUTargetLowering::loadInputValue`: `Arg && "Attempting to load missing argument"`. This commit changes the behavior of SelDAG to instead use a zero constant. This LLVM defect was identified via the AMD Fuzzing project. Cherry-picked from bcba311 Fixes "Arg && "Attempting to load missing argument" assert in Numba from SWDEV-543227 Co-authored-by: Robert Imschweiler <[email protected]>

HIP runtime support for compressed bundle format v3 is in place, therefore switch the default compressed bundle format to v3 in compiler. This allows both compressed and decompressed fat binary size to exceed 4GB by default. Environment variable COMPRESSED_BUNDLE_FORMAT_VERSION=2 can be used for backward compatibility for older HIP runtimes not supporting v3. Fixes: SWDEV-548879

…t_fail() (llvm#144886) (llvm#3189) Modifications to reapply the commit: * Add noexcept only after C++11 on __glibcxx_assert_fail * Remove vararg version of __glibcxx_assert_fail And doc CP. Issue [SWDEV-518041](https://ontrack-internal.amd.com/browse/SWDEV-518041) & doc task [SWDEV-538485](https://ontrack-internal.amd.com/browse/SWDEV-538485) --------- Co-authored-by: Juan Manuel Martinez Caamaño <[email protected]>

llvm#3457)…llvm#129037) When a read(first)lane is used on a binary operator and the intrinsic is the only user of the operator, we can move the read(first)lane into the operand if the other operand is uniform. Unfortunately, IC doesn't let us access UniformityAnalysis, so we can't truly check uniformity; we have to make do with a basic uniformity check that only allows constants or trivially uniform intrinsic calls. We can also do the same for unary and cast operators. Co-authored-by: Pierre van Houtryve <[email protected]>

…tributor run (llvm#155246) (llvm#3772) We do not need this in the attributor, because `ST.getWavesPerEU` accounts for both the waves-per-eu and flat-workgroup-size attributes. If the waves-per-eu values are not valid, it drops them. In the attributor, we only need to propagate the values without using intermediate flat workgroup size values. Fixes SWDEV-550257. (cherry picked from commit ca03045)

…d integers. (llvm#3581)

This patch extends the instruction combiner to simplify the construction of a packed scalar integer from a vector type, such as:

```llvm
target datalayout = "e"

define i32 @src(<4 x i8> %v) {
  %v.0 = extractelement <4 x i8> %v, i32 0
  %z.0 = zext i8 %v.0 to i32

  %v.1 = extractelement <4 x i8> %v, i32 1
  %z.1 = zext i8 %v.1 to i32
  %s.1 = shl i32 %z.1, 8
  %x.1 = or i32 %z.0, %s.1

  %v.2 = extractelement <4 x i8> %v, i32 2
  %z.2 = zext i8 %v.2 to i32
  %s.2 = shl i32 %z.2, 16
  %x.2 = or i32 %x.1, %s.2

  %v.3 = extractelement <4 x i8> %v, i32 3
  %z.3 = zext i8 %v.3 to i32
  %s.3 = shl i32 %z.3, 24
  %x.3 = or i32 %x.2, %s.3

  ret i32 %x.3
}

; ===============

define i32 @tgt(<4 x i8> %v) {
  %x.3 = bitcast <4 x i8> %v to i32
  ret i32 %x.3
}
```

Alive2 proofs (little-endian): [YKdMeg](https://alive2.llvm.org/ce/z/YKdMeg)
Alive2 proofs (big-endian): [vU6iKc](https://alive2.llvm.org/ce/z/vU6iKc)

llvm#3870) …(llvm#3208) 'hsa_vmem_address_free'. Implement interception of 'hsa_amd_vmem_address_reserve_align' and 'hsa_vmem_address_free' so as to support ASan overflow errors for memory allocated via 'hipMallocManaged'. [Ticket: SWDEV-483895] --------- Co-authored-by: Amit Pandey <[email protected]>

Due to a botched merge, we currently emit volatile loads from feature predicate globals. These are never foldable, which breaks things. This does not apply to the upstream patch currently under review. Committing on behalf of GitHub user @AlexVlx

…lvm#3748) This along with IntrReadMem means that the Intrinsic only reads memory through the given argument ptr and its derivatives. This allows passes like Inliner to attach alias.scope to the call instruction as it sees that no other memory is accessed. Discovered via SWDEV-543741 --------- Co-authored-by: Matt Arsenault <[email protected]> Cherry-picked from 1d30f71 --------- Co-authored-by: choikwa <[email protected]>

…lvm#4011) Restrict mfma scale operands to VGPR only (VRegSrc_32) to work around a hardware design defect: for all Inline/SGPR constants, SP HW uses bits [30:23] as the scale. TODO: We may still be able to allow Inline Constants/SGPR, with a proper shift, to obtain potentially better performance. Fixes: SWDEV-548629

lamikr · 2025-09-25T17:54:19Z

I tested with the amd-llvm version, and without this patch the build with cmake 4.1.0 produces the following error:

```
0.8     -- Building the llvm-omp-kernel-replay tool
0.8     CMake Error at /home/lamikr/own/rock/src/sdk/therock_gfx1100_v2/compiler/amd-llvm/offload/hostexec/CMakeLists.txt:13 (cmake_minimum_required):
0.8       Compatibility with CMake < 3.5 has been removed from CMake.
0.8
0.8       Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
0.8       to tell CMake that the project requires at least <min> but has been updated
0.8       to work with policies introduced by <max> or earlier.
0.8
0.8       Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.
0.8
0.8
0.8     -- Configuring incomplete, errors occurred!
0.9     FAILED: runtimes/runtimes-stamps/runtimes-configure /home/lamikr/own/rock/src/sdk/therock_gfx1100_v2/build/compiler/amd-llvm/build/runtimes/runtimes-stamps/runtimes-configure
```
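The error text above mentions the `<min>...<max>` form as an alternative to raising the minimum alone. This PR does not use it; the sketch below only illustrates the syntax, and the upper bound `4.1` is a hypothetical value chosen for illustration:

```cmake
# Hypothetical alternative, not what this PR does:
# require at least CMake 3.20.0, and declare that the project has been
# updated to work with policies introduced up to CMake 4.1.
cmake_minimum_required(VERSION 3.20.0...4.1)
```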

With this patch applied, the LLVM build works with both cmake 3.28.3 and cmake 4.1.0.

```
226.9   -- Building the llvm-omp-kernel-replay tool
226.9   -- Building hostexec for AMDGCN linked against libhsa
226.9   -- HSA Runtime found: /home/lamikr/own/rock/src/sdk/therock_gfx1100_v2/build/compiler/amd-llvm/build/runtimes/rocr-runtime-prefix/src/rocr-runtime-build/rocr/lib/libhsa-runtime64.so
226.9   -- HSA Runtime include: /home/lamikr/own/rock/src/sdk/therock_gfx1100_v2/rocm-systems/projects/rocr-runtime/ annruntime/hsa-runtime/inc
226.9   -- Not building hostexec for NVPTX because cuda not found
226.9      -- Building hostexec with LLVM 20.0.0git found with CLANG_TOOL /home/lamikr/own/rock/src/sdk/therock_gfx1100_v2/build/compiler/amd-llvm/build/bin/clang
226.9   -- Building DeviceRTL. Using clang: /home/lamikr/own/rock/src/sdk/therock_gfx1100_v2/build/compiler/amd-llvm/build/bin/clang, llvm-link: /home/lamikr/own/rock/src/sdk/therock_gfx1100_v2/build/compiler/amd-llvm/build/bin/llvm-link and opt: /home/lamikr/own/rock/src/sdk/therock_gfx1100_v2/build/compiler/amd-llvm/build/bin/opt
226.9   -- Building offloading runtime library libomptarget.
226.9   -- Configuring done (0.5s)
227.2   -- Generating done (0.3s)
227.2   -- Build files have been written to: /home/lamikr/own/rock/src/sdk/therock_gfx1100_v2/build/compiler/amd-llvm/build/runtimes/runtimes-bins
```

After that, I used the LLVM builds made with cmake 3.28.3 and with cmake 4.1.0 to build the rest of the ROCm stack and PyTorch. Kernel loading to AMD GPUs worked fine with my PyTorch and Triton test apps.

The cmake version most likely does not affect the lit tests, so they are probably not relevant here. They passed anyway with this command:

lit amd-llvm/openmp/runtime/test/ompt

Not sure how to do more testing for this one.

marbre · 2025-09-25T20:23:13Z

> Not sure how to do more testing for this one.

There is no need for further testing on our side. The LLVM team has picked this up and is working on it, but the change will go to an internal repo first.

jmmartinez and others added 30 commits March 20, 2025 11:12

[Comgr][Cache] Enably the cache by default

4c66d60

[Comgr] Do not compute opencl-c.h hash (llvm#1269)

3f3b249

[Comgr][Cache] Fix broken test: spirv-translator-cached.cl (llvm#1270)

20fbc4b

[Comgr][Cache] Late code reviews cherry-picked from staging to mainli…

9d5047e

…ne (llvm#1271)

[Comgr][Cache] Enable the cache by default (llvm#1272)

85a5451

[Cache][SPIRV] Fix flacky test... again (llvm#1318)

ca47f24

[Comgr] Fix disassem-instruction memory corruption (llvm#1336)

842526e

[Comgr][Cache] Fix broken test: spirv-translator-cached.cl

84c7a38

[Comgr][Merge] Two dependent changes to fix spirv test (llvm#1317) (l…

a893c39

…lvm#1365) from row #19 in "Mainline for 6.5 Cherry-pick List" amd-staging commits: [Comgr][Cache] Fix broken test: spirv-translator-cached.cl · 51fa25b [Cache][SPIRV] Fix flacky test... again · 56cf45a

[AMDGPU] Auto generated check lines for two tests (llvm#1126) (llvm#1368

4a86a97

)

[AMDGPU] Change SGPR layout to striped caller/callee saved (llvm#127353…

bb5598a

…) (llvm#1371)

[Attributor] Do not optimize away externally_initialized loads. (llvm…

e6c5ed5

…#128170) Fixes SWDEV-515029

[Attributor] Do not optimize away externally_initialized loads. (#128… (

3dbccb7

llvm#1372)

[Comgr] Fix disassem-instruction memory corruption (llvm#1336) (llvm#…

92fc035

…1354)

[Comgr] Add new Action to compile SPIR-V to Relocatable (llvm#1362)

7aac4cb

Add Env var control on enabling device-to-device memory access

c27d4e8

[Comgr][V3] Increment Version number to 3.0

0475427

Also archive the Comgr V3 Release notes, and start a new document for Comgr V4 changes. Change-Id: I25137c174bd70caafe9b3c26d3a956331e0e9dfc

Add test for warning about using amdflang-new

0f51431

Emit warning when amdflang-new is invoked

0f9a104

Add Env var control on enabling device-to-device memory access (llvm#…

cdead4f

…1384)

[Comgr][V3] Increment Version number to 3.0 (llvm#1423)

28b85e3

Combined ASAN device malloc patches to eliminate false reports

aaed638

Combined ASAN device malloc patches to eliminate false reports (llvm#…

9b71db5

…1522)

Bring tgamma patches 1397 and 1429 over to mainline

af6fb56

choikwa and others added 23 commits August 5, 2025 16:08

[CUDA][HIP] capture possible ODR-used var (llvm#136645) (llvm#3443)

60f9156

[HIP] Claim --offload-compress for -M (llvm#133456) (llvm#3442)

7e0e42b

[HIP] compressed bundle format defaults to v3 (llvm#3503)

eeb5d84

Cherry-picking 71d6762

185ddcf

[AMDGPU] Ensure non-reserved CSR spilled regs are live-in (llvm#146427)…

2b07f34

… (llvm#3432)

SWDEV-465041 - Enable queue write index programming (llvm#2404) (llvm…

12fb44f

…#3749) The workaround will be active only if the system doesn't have pcie atomics Co-authored-by: Andryeyev, German <[email protected]>

[NFC][OffloadBundle] Fix compile warnings (llvm#3700)

9461df3

[NFC] Fix compile warnings in `llvm/unittests/Object/OffloadingBundle…

7757f13

…Test.cpp` (llvm#3773)

[compiler-rt]: fix CodeQL errors (llvm#3798)

cc250b8

Co-authored-by: Amit Kumar Pandey <[email protected]> Co-authored-by: Hans Wennborg <[email protected]> Co-authored-by: Amit Pandey <[email protected]>

[AMDGPU] Fix a crash by skipping DBG instrs at start of sched region … (

b2c1136

llvm#3577) ...(llvm#131167) Fixes SWDEV-514946 Co-authored-by: Emma Pilkington <[email protected]>

[Comgr] Fix memory leak in name expression API

afe89d2

Co-authored-by: Thao, Vang <[email protected]>

[hipcc] Remove PERL scripts, add GitHub repo link

1e6a516

Add reference to ROCm compiler reference, remove unused test file update link in ENV topic

Hostexec: Enable building with cmake4

f9f614c

Bump the minimum required cmake version from 3.0 to 3.20.0 to enable building with cmake4. This is the same minimum required version as the parent directory "offload" uses.

HereThereBeDragons requested a review from estewart08 September 16, 2025 15:19

HereThereBeDragons mentioned this pull request Sep 23, 2025

Hostexec: Enable building with cmake4 ROCm/TheRock#1493

Open

lamikr self-requested a review September 23, 2025 21:52

marbre removed the request for review from lamikr September 24, 2025 08:45

kzhuravl force-pushed the amd-mainline branch from 3b36fc8 to ffd4bcc Compare October 13, 2025 17:11
